Correlation Does Not Imply Compensation

Complexity and Irregularity in the Lexicon

Amanda Doucette

McGill University

they/them

Ryan Cotterell

ETH Zurich

he/him

Morgan Sonderegger

McGill University

he/him

Timothy J. O’Donnell

McGill University, Mila, Canada CIFAR AI Chair

he/him

SCiL 2024

June 28, 2024

The compensation hypothesis

As a language increases in complexity in one area, another must decrease in complexity to compensate.

Objective measurement is difficult, but impressionistically it would seem that the total grammatical complexity of any language, counting both morphology and syntax, is about the same as that of any other. This is not surprising, since all languages have about equally complex jobs to do, and what is not done morphologically has to be done syntactically.

– Hockett 1958, “A Course in Modern Linguistics”

Identifying complexity trade-offs

  • While measuring the total complexity of a language is difficult, we can estimate the complexity of some of its aspects.
  • Many studies have looked for trade-offs between different measures of linguistic complexity.
  • Typically, this is done through pairwise correlation or regression analysis.

A trade-off between phonotactic complexity and morphological irregularity?

  • It has been proposed that there is a trade-off between phonotactic complexity and morphological irregularity:
    • Phonotactically complex words are morphologically regular.
    • Morphologically irregular words have simple phonotactics.

…But what if these are both correlated with something else, like frequency?

How do we define a trade-off when more than two variables are involved?

A trade-off between phonotactic complexity and morphological irregularity?

  • Evidence of this trade-off is limited to individual languages (mostly English) and generally small sets of words.

Our question: Does a trade-off between phonotactic complexity and morphological irregularity exist in a larger set of languages? Is it universal?

Correlation does not imply compensation: causation implies compensation!

The problem with correlation

Imagine a language where word length is a mediator in the relationship between frequency and phonotactic complexity.

\[ \begin{aligned} FR &\sim \mathcal{N}(\mu = 2, \sigma = 1)\\ WL &\sim FR + \mathcal{N}(\mu = 0, \sigma = 0.2)\\ PC &\sim WL + \mathcal{N}(\mu = 0, \sigma = 0.2) \end{aligned} \]

All three pairwise correlations are near 1 (\(\rho\) = 0.981, 0.964, 0.983), even though FR affects PC only through WL.
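This setup is easy to check by simulation. A minimal sketch in Python/NumPy, following the generative equations above (the sample size and random seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# FR -> WL -> PC: word length mediates frequency's effect on complexity
FR = rng.normal(2, 1, n)
WL = FR + rng.normal(0, 0.2, n)
PC = WL + rng.normal(0, 0.2, n)

# All three pairwise correlations come out near 1,
# even though FR affects PC only through WL.
print(np.corrcoef(FR, WL)[0, 1])
print(np.corrcoef(WL, PC)[0, 1])
print(np.corrcoef(FR, PC)[0, 1])
```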

…and the problem with regression

If frequency and phonotactic complexity are common causes of word length, how does frequency affect phonotactic complexity?

\[ \begin{aligned} FR &\sim \mathcal{N}(\mu = 2, \sigma = 1)\\ PC &\sim \mathcal{N}(\mu = 0, \sigma = 0.2)\\ WL &\sim FR + PC + \mathcal{N}(\mu = 0, \sigma = 0.2) \end{aligned} \]

  1. PC ~ FR + WL

Pred.         Est.    CI              p
(Intercept)    0.00   -0.01 – 0.02    0.652
FR            -0.50   -0.53 – -0.47   <0.001
WL             0.50    0.46 – 0.53    <0.001

FR has a negative effect on PC, even though PC is generated independently of FR.

  2. PC ~ FR

Pred.         Est.    CI              p
(Intercept)    0.01   -0.02 – 0.04    0.470
FR            -0.00   -0.02 – 0.01    0.568

FR has no effect on PC, as expected!
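The same point can be reproduced by simulation. A sketch in Python/NumPy that generates PC independently of FR and fits both regressions with ordinary least squares (implemented directly with `np.linalg.lstsq` rather than any particular stats package; sample size and seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# FR and PC are independent; both cause WL (WL is a collider)
FR = rng.normal(2, 1, n)
PC = rng.normal(0, 0.2, n)
WL = FR + PC + rng.normal(0, 0.2, n)

def ols(y, *predictors):
    """Least-squares fit with an intercept; returns coefficients
    in the order (intercept, *predictors)."""
    X = np.column_stack([np.ones_like(y), *predictors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Controlling for the collider WL induces a spurious FR effect (about -0.5)
_, b_fr, b_wl = ols(PC, FR, WL)
# Leaving WL out recovers the true (null) effect of FR on PC
_, b_fr_only = ols(PC, FR)
print(b_fr, b_wl, b_fr_only)
```

With equal noise variances, the induced coefficient on FR is exactly \(-\sigma_{PC}^2/(\sigma_{PC}^2+\sigma_e^2) = -0.5\), matching the table above.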

Background

What else affects phonotactic complexity and morphological irregularity?

  • We want to determine whether or not there is a direct causal link between phonotactic complexity and morphological irregularity.
  • To do this, we need to know: What other factors could affect their relationship?
  • We don’t know the underlying causal structure, but we can look at previous work on correlations with other lexical variables.

Phonotactic Complexity and Morphological Irregularity

Phonotactic Complexity and Word Length

Word Length and Frequency

Morphological Irregularity and Frequency

Phonotactic Complexity and Frequency

Morphological Irregularity and Word Length

Methods

Data: 25 languages

Morphology: UniMorph (Batsuren et al. 2022)

  • 182 languages, each word annotated with lemma, inflected form, and a set of morphological features
    • e.g. walk, walked, [verb, singular, past]
  • No IPA transcriptions!
    • We use Epitran grapheme-to-phoneme models to convert orthography to IPA and exclude languages without models (Mortensen et al. 2018)

Phonotactics: NorthEuraLex
(Dellert et al. 2020)

  • Phonetic transcriptions of 1,016 basic concepts for 107 Northern Eurasian languages
  • Mainly morphologically simple words

Phonotactics: WikiPron (Lee et al. 2020)

  • Wiktionary pronunciation dictionaries
  • Used for languages not included in NorthEuraLex

Frequency Data: Wikipedia

  • Log count per million words; zero-frequency forms excluded
  • Retrieved from the full Wikipedia dump for each language
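A minimal sketch of this frequency measure, assuming the natural log and hypothetical counts (zero-count forms are filtered out beforehand):

```python
import numpy as np

def log_freq_per_million(count, total_tokens):
    """Natural-log frequency per million tokens; zero-frequency
    forms are excluded upstream, so count is assumed > 0."""
    return np.log(count / total_tokens * 1_000_000)

# e.g. a form seen 150 times in a 50M-token Wikipedia dump
print(log_freq_per_million(150, 50_000_000))
```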

Quantifying Morphological Irregularity

Wu, Cotterell, & O’Donnell 2019

  • People have gradient intuitions of morphological irregularity
    • walked is more regular than sang which is more regular than went
  • We need a continuous measure of morphological irregularity that is:
    • more positive when an inflected form is less predictable given the rest of the language
    • more negative when the inflected form is more predictable
  • Estimated using a neural language model


Morphological Irregularity of a Word

\[ \text{MI}(w, \ell, \sigma) = -\log \frac{p(w \mid \ell, \sigma, \mathcal{L}_{-\ell})}{1 - p(w \mid \ell, \sigma, \mathcal{L}_{-\ell})} \]

where \(w\) is the word, \(\ell\) is the lemma, \(\sigma\) is the slot (e.g. PAST, SINGULAR), and \(\mathcal{L}_{-\ell}\) is the lexicon with the target lemma removed.

Example: \(w=\) walk, \(\ell=\) Walk, \(\sigma=\) [singular, past], \(\mathcal{L}_{-\boldsymbol\ell} =\) English-Walk

Morphological Irregularity of a Lemma

\[ \text{MI}(\ell) = \frac{1}{|\mathcal{S}|} \sum_{\sigma \in \mathcal{S}} \text{MI}(\iota(\ell, \sigma), \ell, \sigma) \]

where \(|\mathcal{S}|\) is the number of inflected forms associated with \(\ell\), and \(\iota(\ell, \sigma)\) is the lemma inflected with morphological features \(\sigma\) (a word \(w\)): the sum averages \(\text{MI}\) over the inflected forms associated with \(\ell\).
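Given the model's probability \(p = p(w \mid \ell, \sigma, \mathcal{L}_{-\ell})\), MI is the negative log-odds of the correct form. A minimal sketch in Python with hypothetical probabilities (in practice these come from the neural language model; the natural log is an assumption here):

```python
import math

def mi_word(p):
    """MI of an inflected form: the negative log-odds of the model
    producing the correct form w for (lemma, slot), with the lemma's
    forms held out of the training lexicon."""
    return -math.log(p / (1 - p))

def mi_lemma(slot_probs):
    """MI of a lemma: the mean MI over its inflected forms."""
    return sum(mi_word(p) for p in slot_probs) / len(slot_probs)

# A highly predictable form (p = 0.9) gets negative MI (regular);
# an unpredictable one (p = 0.1) gets positive MI (irregular).
print(mi_word(0.9), mi_word(0.1))
```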

Quantifying Phonotactic Complexity

Pimentel, Roark & Cotterell 2020

  • On average, how predictable is each phoneme in a word given the previous phonemes?
    • Bits per phoneme, analogous to surprisal
  • Estimated with a character-level LSTM language model

\[ \text{PC}(w) = - \frac{\log p(w \mid \mathcal{L}_{-w})}{|w|} \]

where \(w\) is the word, \(\mathcal{L}_{-w}\) is the lexicon with the target word removed, and \(|w|\) is the length of the word in phones.
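By the chain rule, \(\log p(w \mid \mathcal{L}_{-w})\) decomposes into a sum of per-phone log-probabilities, so PC is average per-phone surprisal. A sketch assuming the model's per-phone probabilities are given (hypothetical values; log base 2 for bits):

```python
import math

def pc_word(phone_probs):
    """Phonotactic complexity: average surprisal in bits per phone,
    given the model's probability of each phone in sequence."""
    return -sum(math.log2(p) for p in phone_probs) / len(phone_probs)

# A word whose phones are each predicted with p = 0.5 costs 1 bit/phone
print(pc_word([0.5, 0.5, 0.5]))  # → 1.0
```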

Regression Analysis

For each pair of variables, is there a relationship after controlling for variables noted in previous work?

Within languages:
Linear regression

Is there a relationship within an individual language?

  • Separate regression per language
  • All pairs of variables
  • Control for variables noted in previous slides

Across languages:
Linear mixed effects regression

Are languages that are more complex in one way less complex in another?

  • Random intercepts and slopes for language effect
  • Frequency excluded
  • Means of dependent and control variables included as additional predictors

Results

Phonotactic Complexity and Morphological Irregularity

Within Language

MI ~ PC + FR + WL

Across Languages

MI ~ PC + FR + WL + mean(PC) + mean(WL) + (1 + PC + FR + WL | language)

Phonotactic Complexity and Length

Within Language

PC ~ WL

Across Languages

PC ~ WL + mean(WL) + (1 + WL | language)

Morphological Irregularity and Frequency

MI ~ FR

Phonotactic Complexity and Frequency

PC ~ FR + WL

Morphological Irregularity and Length

Within Language

MI ~ WL + FR

Across Languages

MI ~ WL + FR + mean(WL) + (1 + FR + WL | language)

Word Length and Frequency

WL ~ FR

Conclusions &
Future Work

Summary: Within languages

Summary: Across languages

Discussion

  • Both within and across languages, morphological irregularity, phonotactic complexity, word length, and frequency clearly influence each other in some way.
  • Previous work examining only pairwise relationships generally concludes that there is strong support for a relationship.
  • By considering a larger set of variables, our results show that these relationships may not be as strongly supported as previously thought.

A need for causal models

  • Depending on what variables are controlled for, and the properties of a language sample, results can change drastically.
  • To describe compensation relationships in a set of highly correlated variables, we need to understand the causal structure of the data.
  • If we could directly manipulate properties of a lexicon, we could identify causal relationships through experimentation.
  • Future work: Applying causal discovery methods to lexical data.

Thank you!